ABSTRACT

Freehand line sketches are an interesting and unique form of visual
representation. Typically, such sketches are studied and utilized as
an end product of the sketching process. However, we have found it
instructive to study sketches as a sequentially accumulated
composition of drawing strokes added over time. Studying sketches
in this manner has enabled us to create novel, sparse yet
discriminative sketch-based representations for object categories,
which we term category-epitomes. Our procedure for obtaining these
epitomes concurrently provides a natural measure for quantifying the
sparseness underlying the original sketch, which we term
epitome-score. We construct and analyze category-epitomes and
epitome-scores for freehand sketches belonging to various object
categories. Our analysis provides a novel viewpoint for studying the
semantic nature of object categories.

Index Terms: sketch, object category recognition, sketch sequence
analysis, pattern recognition, epitome, sparse, minimalist

1. INTRODUCTION 

Sketches, as a form of visual representation, exhibit great variety,
from realistic portraits to sparsely drawn, stylistic ones. In particular,
consider freehand (i.e. hand-drawn) sketches of objects. An instance
of such a sketch can be seen in Figure 1. Though the sketch contains
minimal detail, the object category to which it belongs is easily
determined. This suggests an inherent sparseness in the human
neuro-visual representation of the object. Therefore, studying such
sparse sketches can aid our understanding of the cognitive processes
involved and spur the design of efficient visual classifiers.

Freehand line sketches are typically formed as a composition
of primitive hand-drawn curves (called strokes) added sequentially
over time. A significant body of work has examined such
sketches in the context of classification and content-based retrieval
problems [1], [2], [3]. In these problems, the end product of the
sketching process, i.e. the full sketch, is typically considered in
toto. However, we believe it can be quite instructive to study the
temporal process of sketch formation itself, from the first
hand-drawn stroke to the last stroke which finalizes the sketch.
Our belief originates in a surprising discovery we have made: for
a given sketch, there exists a minimal discriminative subset of
its strokes which is sufficient for the sketch’s identity (category)
to be recognized consistently and correctly. We term the sketch
composed using this minimal stroke subset a category-epitome.
Figure 2 shows examples of freehand line sketches and their
corresponding sparse category-epitomes.

In this paper, we describe how these sparse category-epitomes
are obtained. The category-epitome has a unique feature which sets
it apart from other methods of sparse sketch generation: the
process of epitome construction guarantees that the most fundamental
strokes which enable category recognition (discrimination) are
retained. To quantify the sparseness of an epitome, we provide a
natural measure termed epitome-score. We analyze the epitomes and
corresponding epitome-scores for freehand sketches across various
object categories. Our analysis provides a novel viewpoint for
studying the semantic nature of object categories.

Fig. 1: In spite of minimal detail, we can recognize the line sketch
easily and correctly as belonging to the category cup.

The rest of the paper is organized as follows. We briefly review
related literature in Section 2. In Section 3, we describe the
construction of a sketch classifier. This sketch classifier plays a
crucial role in obtaining the category-epitome of a given sketch. In
Section 4, we describe how the category-epitome of a sketch is
obtained and provide a simple, natural measure termed epitome-score
to quantify its epitomal-ness. Section 5 contains an analysis of
category-epitomes and epitome-scores across object categories.
Section 6 concludes the paper by outlining some promising directions
for future work.

2. RELATED WORK 

In his seminal work [?], Marr suggested an abstract representation
called the primal sketch: a sparse, sketch-like representation of
generic images in terms of image primitives. Inspired by his
theories, methods for formalizing the notion of primal sketch have been
proposed [6], and primal sketch representations have been used as
features for object detection [?], texture characterization [7] and
super-resolution [?]. Yet another line of research aims to generate
sketches from one or more source images [4], [9] without explicit
recourse to the idea of primal sketch. A common feature of all these
works is the use of photographic images as the starting point.
In contrast, our starting point is the sketch stroke data created by
human beings. This provides a glimmer of hope that the sparse
neuro-visual representation of the object being sketched is
transferred, at least in part, to the sketch in the process of drawing.

In addition to the artificial (i.e. not generated by human hand)
nature of sketch generation, the works mentioned above do not
attempt to quantify the sparseness of the resulting sketch. They also
do not examine the temporal nature of sketch composition. In
contrast, recent work by Berger et al. [10] analyzes the temporal
aspect of sketching in the context of mimicking artist style. However,
their emphasis is on synthesizing abstract facial sketches rather than
on recognition. Moreover, their sketches are produced by professional
artists, whereas ours have been generated by crowdsourcing
from the general public. The idea of identifying and utilizing a
discriminative subset of strokes was employed by Karteek et al. [?]
for classifying online handwritten data. Another work close in spirit
to ours is the unpublished but publicly available manuscript of Jiang
et al. [?], which examines the temporal evolution of sketches from a
visualization perspective. Finally, the work of Eitz et al. [13]
examines how humans tend to draw objects by analyzing a large number
of sketches spread across commonly encountered object categories.
We employ their database of sketches in this paper.

Fig. 2: Original sketches (top row) and corresponding category-epitomes (bottom row) for various object categories.

3. BUILDING THE SKETCH CLASSIFIER 

3.1. The sketch database 

The sketches used in our study have been taken from the publicly
available freehand line sketch database of Eitz et al. [13]. This
database contains a curated set of 20,000 hand-drawn sketches
evenly distributed across 250 object categories. As mentioned
before, these sketches have been obtained by crowdsourcing across
the general population. As such, they are a good starting point for
analyzing the underpinnings of the sketching process by humans. A
few examples from the database can be seen in the top row of Figure
2. The dominant appeal of this database is that the temporal stroke
information (the sequential order in which the strokes were drawn)
is provided for each sketch. As we will see in Section 4, it is
precisely this temporal stroke information that forms the basis for
obtaining the category-epitome and the corresponding epitome-score
for sketches.

3.2. Sketch data augmentation 

The database of Eitz et al. contains 80 sketches per category. To
increase the number of sketches per category available for training
the classifier (Section 3.4), we perform data augmentation by
applying geometric and morphological transformations to each sketch.
Specifically, each sketch is first subjected to image dilation
using a 5 × 5 square structuring element. A number of transforms are
then applied to this thickened sketch: mirroring (across the vertical
axis), rotation (±5, ±15 degrees), combinations of horizontal and
vertical shifts (±5, ±15 pixels), and central zoom (±3%, ±7% of image
height). As a result, 30 new sketches are generated per original
sketch. The data augmentation procedure results in 2400 sketches per
category, for a total of 600,000 sketches across 250 categories.
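
As a rough illustration of the first two steps above, the following sketch dilates a binary sketch image with a 5 × 5 square structuring element and mirrors it across the vertical axis, then checks the sketch counts quoted in this section. It assumes that a sketch is stored as a list of rows of 0/1 pixels; the function names `dilate` and `mirror` are hypothetical, and the rotation, shift and zoom transforms are omitted.

```python
def dilate(img, size=5):
    """Binary dilation with a size x size square structuring element."""
    h, w = len(img), len(img[0])
    r = size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                # Paint the square neighbourhood, clipped at the borders.
                for yy in range(max(0, y - r), min(h, y + r + 1)):
                    for xx in range(max(0, x - r), min(w, x + r + 1)):
                        out[yy][xx] = 1
    return out


def mirror(img):
    """Mirror across the vertical axis (left-right flip)."""
    return [row[::-1] for row in img]


# Sketch counts quoted in the text.
ORIGINALS_PER_CATEGORY = 80
VARIANTS_PER_ORIGINAL = 30
CATEGORIES = 250

per_category = ORIGINALS_PER_CATEGORY * VARIANTS_PER_ORIGINAL
total = per_category * CATEGORIES
```

A pure-Python representation is used here only to keep the example self-contained; an image-processing library would normally perform these operations.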


3.3. Sketch feature extraction 

As the top row of Figure 2 demonstrates, the spatial density of sketch
strokes can be quite small. Moreover, extracting typical image
features (e.g. based on texture) is ruled out and edge information is
quite sparse. Nevertheless, a number of alternatives have been
proposed as sketch features. For a survey of these methods in the context
of sketch-based image retrieval, refer to [?]. We extract Histogram
of Oriented Gradients (HOG [14])-like sketch descriptors using the
pipeline described by Eitz et al. [13]. The collection of descriptors
so obtained is then combined using the Fisher image representation
approach [?] to obtain a feature vector for each sketch.


3.4. Sketch Classification 

As an initial exploration and for ease of analysis, we consider only
the first 50 alphabetically sorted sketch categories. We build a sketch
classifier by utilizing 80% of the augmented sketches (Section 3.2)
from each category for training and the rest for testing. Doing so
provides 2400 × 0.8 × 50 = 96,000 sketches for training and 80 ×
0.2 × 50 = 800 sketches for testing.¹
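
The split sizes quoted above can be checked with a few lines of arithmetic. This is only a sanity check of the stated numbers; the variable names are hypothetical, and the test count uses 80 sketches per category because, per the footnote, testing uses only the dilated originals.

```python
CATEGORIES_USED = 50
ORIGINALS_PER_CATEGORY = 80
AUGMENTED_PER_CATEGORY = 2400
TRAIN_PERCENT = 80

# Training draws on the augmented pool, testing on the originals only.
train_per_category = AUGMENTED_PER_CATEGORY * TRAIN_PERCENT // 100
test_per_category = ORIGINALS_PER_CATEGORY * (100 - TRAIN_PERCENT) // 100

train_total = train_per_category * CATEGORIES_USED
test_total = test_per_category * CATEGORIES_USED
```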

From each test sketch, its Fisher feature vector (Section 3.3) is
obtained. A multi-class Support Vector Machine (SVM) classifier
employing a Radial Basis Function (RBF) kernel was trained on the
Fisher feature vectors using 5-fold cross validation and grid-based
parameter search. The accuracy of the resulting classifier was
60.25%. For context, the accuracy (for the same split ratio of
training and test sketches) obtained by Eitz et al. [13] is 54%, with the
caveat that their number of categories (250) is larger than ours (50).

In the overall scheme of things, the sketch classifier is only
useful to the extent that it lets us determine the category-epitomes. Seen
in this light, the accuracy of the classifier assumes secondary
importance. However, a classifier with good performance is still
desirable as it ensures a larger coverage of the test set. In addition, such
a classifier could potentially help obtain sparser category-epitomes
compared to a counterpart whose performance is relatively poor.
Nevertheless, to progress towards our goal of determining
category-epitomes, we settle for the existing performance of the sketch
classifier. In the next section, we shall see how the sketch classifier is
actually utilized in constructing the category-epitome.


¹Testing is done only on the original sketch subjected to dilation and not
on its subsequently transformed variants generated for data augmentation.
Hence the factor of 80 for each test category instead of the full 2400 as in
training.
Fig. 3: Constructing the category-epitome for an airplane sketch: The sketch has 9 strokes. $S_1$ to $S_9$ are the cumulative stroke sequence
canvases. A red cross mark indicates a misclassification (0) while a green tick mark indicates correct classification (1). Canvas $S_4$, outlined by
a cyan rectangle, is the category-epitome. Note that even though canvas $S_2$ is classified correctly, we consider the category-epitome to be canvas $S_4$,
the temporally earliest, correctly classified canvas whose successors are classified correctly as well.


4. OBTAINING THE CATEGORY-EPITOME 

4.1. Constructing Cumulative Stroke Sequences 

As the first step in determining the category-epitome, we construct
sequences of cumulative strokes derived from correctly classified
test sketches. Suppose the sequence of strokes in a test sketch, in the
temporal order they were drawn, is given by
$\mathcal{S} = \{s_1, s_2, \ldots, s_N\}$,
where $N$ is the total number of strokes in the sketch. To construct the
corresponding cumulative stroke sequence, we begin with a blank
canvas. Strokes from the given sketch are successively added to the
blank canvas in temporal order. As each stroke is added, intermediate
canvases $S_1, S_2, \ldots, S_N$ are created. Specifically, the
intermediate canvases are given by $S_1 = \{s_1\}$, $S_2 = \{s_1, s_2\}$,
$\ldots$, $S_N = \{s_1, s_2, \ldots, s_N\}$. At the end of this process, we
obtain the cumulative stroke sequence $CSS = \{S_1, \ldots, S_N\}$.
Figure 3 illustrates the creation of the cumulative stroke sequence for
a sketch from the airplane category.
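
The construction above can be sketched in a few lines. This is a minimal illustration, assuming each stroke is stored as a list of (x, y) points; the function name `cumulative_stroke_sequence` is hypothetical, and rendering each canvas to an image is left out.

```python
def cumulative_stroke_sequence(strokes):
    """Return [S_1, ..., S_N], where S_i holds the first i strokes."""
    return [list(strokes[:i]) for i in range(1, len(strokes) + 1)]


# Each stroke is assumed to be a list of (x, y) points; any stroke type works.
strokes = [[(0, 0), (1, 1)], [(1, 1), (2, 0)], [(0, 2), (2, 2)]]
css = cumulative_stroke_sequence(strokes)
```

Each element of `css` would then be rasterized and passed through the same feature extraction as a full sketch.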

4.2. Constructing the category-epitome

Having generated the cumulative stroke sequence for a sketch as
described above, the category-epitome can be constructed. Using the
classifier from Section 3, each intermediate canvas of the cumulative
stroke sequence is classified to obtain a binary labeling: the label
is 1 if the sketch category is correctly identified and 0 otherwise.
Thus, we obtain a binary label sequence
$\mathcal{L} = \{l_1, l_2, \ldots, l_N\}$ whose elements correspond to
the canvases of the cumulative stroke sequence $CSS$ (see Figure 3).
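
The labeling step, together with the selection rule summarized in the caption of Figure 3 (the temporally earliest correctly classified canvas whose successors are also classified correctly), can be sketched as follows. The `classify` callable stands in for the trained classifier and is an assumption of this example, as are the function names.

```python
def label_sequence(canvases, classify, true_category):
    """l_i = 1 if canvas S_i is assigned the true category, else 0."""
    return [1 if classify(c) == true_category else 0 for c in canvases]


def epitome_index(labels):
    """1-based index of the earliest canvas from which every label is 1.

    Returns None if the final label is 0 (sketch not correctly classified).
    """
    if not labels or labels[-1] != 1:
        return None
    e = len(labels)
    # Walk backwards while the run of correct labels continues.
    while e > 1 and labels[e - 2] == 1:
        e -= 1
    return e


# Label sequence of the airplane example in Figure 3.
airplane_labels = [0, 1, 0, 1, 1, 1, 1, 1, 1]
e = epitome_index(airplane_labels)  # 4
```

The backward walk stops at the first misclassified canvas, so an isolated early success such as the second canvas here does not count.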

Note that the final canvas $S_N$ corresponds to the original test
sketch since all the strokes have been added to the canvas at that
point. Therefore, the final classification label $l_N$ must be 1 since we
are working with correctly classified test sketches. Now, consider
the product sequence $\mathcal{P} = \{P_1, P_2, \ldots, P_N\}$ formed by
cumulative multiplication of labels $l_i \in \mathcal{L}$, $i = 1, 2, \ldots, N$:

$$ P_i = \prod_{j=i}^{N} l_j \tag{1} $$

Then, the category-epitome corresponds to canvas $S_e$ of the
cumulative stroke sequence such that

$$ e = \min_{1 \le i \le N} \{\, i \mid P_i = 1 \,\} \tag{2} $$

Informally, the category-epitome $S_e$ is the temporally earliest,
correctly classified canvas whose successors are classified correctly
as well. Using the example in Figure 3, the classification label
sequence is given by $\mathcal{L} = \{0, 1, 0, 1, 1, 1, 1, 1, 1\}$. From Equation (1),
the product sequence is computed as $\mathcal{P} = \{0, 0, 0, 1, 1, 1, 1, 1, 1\}$.
From Equation (2), we obtain $e = 4$. In other words, canvas $S_4$
(outlined by a cyan rectangle in Figure 3) corresponds to the
category-epitome: the temporally earliest, correctly classified canvas whose
successors $S_5, \ldots, S_9$ are classified correctly as well. Figure 2 shows
sketches from various categories and their corresponding
category-epitomes.

4.3. Epitome-score: Quantifying the category-epitome

Our procedure for obtaining the category-epitome, described above,
also provides a natural method for quantifying the “epitome”-ness
of the original, full sketch. Using $e$ obtained from Equation (2), we
define the epitome-score $\mathcal{E}$ of a sketch as:

$$ \mathcal{E} = \begin{cases} e/N, & e \neq 1 \\ 0, & e = 1 \end{cases} \tag{3} $$

where $N$ is the total number of strokes in the sketch. The case $e = 1$
corresponds to the situation where merely drawing the first stroke
conveys the epitome-ness of the sketch. Therefore, we have defined
the corresponding epitome-score $\mathcal{E}$ to be 0 for consistency across
sketches. Our definition of the epitome-score $\mathcal{E}$ essentially conveys the
sparseness underlying the sketch: the smaller its value, the
sparser the sketch is likely to be. Epitome-scores very close to 1, on
the other hand, indicate that very few strokes in the original sketch
are redundant. Fortunately, very high epitome-scores are not the
norm, as we will see: sparsity is pervasive across categories.
Referring once again to Figure 3, the epitome-score for the airplane
sketch is computed as $\mathcal{E} = 4/9 = 0.44$.

5. ANALYSIS

5.1. Epitome-scores across categories

We begin our analysis by computing the median of epitome-scores
for sketches on a category-by-category basis. Figure 4 displays the
median epitome-scores (shown as filled circles in the figure) for test
sketches across 50 object categories. It is heartening to observe that
42% (21/50) of the categories have median epitome-scores below
0.5 and 80% (40/50) of the categories have scores below 0.75.
These trends suggest an inherent sparseness in the visual
representation of object categories because the smaller the epitome-score,
the sparser the category-epitome sketch. If we examine the scores
closely, the epitome-scores for some of the categories (apple and
boomerang) are 0. For these categories, the sketches (see Figure
2) are typically drawn such that a dominant stroke (an oval in the
vertical plane with a notch at the top for the apple and the chevron-like
contour for the boomerang) essentially captures the epitome-ness of
the category (see also Equation (3)).

The varying lengths of the error bars in Figure 4 indicate that
some object categories may have multiple representative
category-epitomes rather than a single, unique representative. For example,
the sketches of the category (human) arm are likely drawn with a
fairly consistent appearance by humans. This consistency influences
the sketch classifier and can cause it to produce category-epitomes
which exhibit only minor variation, thus resulting in a compact
distribution of epitome-scores (shorter error bars). In contrast,
the variety in sketches of a category such as carrot is reflected in
the corresponding category-epitomes and, by extension, in the longer
error bars of its epitome-scores.

Fig. 4: Median epitome-scores (y-axis) and corresponding error bars for 50 object categories (x-axis). The standard errors are clamped to
[0, 1], the range of epitome-scores. The median scores are shown as filled circles.

5.2. Epitome-score as a proxy for semantic level of detail

A different perspective can be gained by examining categories for
a given value of epitome-score. For each category, we count the
number of test sketches whose epitome-score exceeds a threshold
and normalize by the number of test sketches in the category. To
facilitate analysis and avoid visual clutter, we select 4 prototypical
categories whose plot contours occupy the extremities (towards the
top-left corner and bottom-right corner) of the original plot. In addition,
we select 6 other prototypical categories whose plot contours lie
between the extremity plots previously mentioned. The resulting plot
can be viewed in Figure 5.

Epitome-score can be considered a proxy for semantic
level-of-detail. Viewed in this light, the plot in Figure 5 suggests
varying epitome-score budgets across categories: some categories
require a considerable level of detail before their epitomal avatars are
revealed (e.g. categories with plots towards the lower right such as
bridge and bicycle). On the other hand, categories with plots
towards the upper left corner (apple, boomerang) have relatively
less stringent demands on level of detail.

Fig. 5: Demonstrating the effect of budgeting epitome-score on
category recognition rate using data from 10 selected categories. Values
of various possible epitome-score thresholds are shown on the x-axis.
The % of images in each category which exceed a particular value
of the threshold are shown on the y-axis. The 10 category names are
shown adjacent to their respective plots.

To view a larger set of category-epitomes, t-SNE [?]
visualizations of epitomes for select categories and additional
results which support the analysis presented in this section, visit
http://val.serc.iisc.ernet.in/sketchepitome/se.html
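
The per-category statistics discussed in this section can be sketched as follows. This is a minimal illustration, assuming that each test sketch has already been reduced to a pair (e, N) of epitome index and stroke count; the function names are hypothetical and the toy values are illustrative only, not data from the paper. The thresholding follows the wording of the text (scores that exceed the threshold).

```python
from statistics import median


def epitome_score(e, n_strokes):
    """Epitome-score: e / N, defined as 0 when the first stroke suffices."""
    return 0.0 if e == 1 else e / n_strokes


def median_scores(scores_by_category):
    """Median epitome-score per category."""
    return {cat: median(s) for cat, s in scores_by_category.items()}


def fraction_exceeding(scores, threshold):
    """Fraction of a category's test sketches whose score exceeds threshold."""
    return sum(s > threshold for s in scores) / len(scores)


# Toy (e, N) pairs per category; illustrative values only, not from the paper.
toy = {
    "apple": [(1, 3), (1, 2), (2, 4)],
    "bridge": [(6, 8), (9, 10), (4, 5)],
}
scores = {cat: [epitome_score(e, n) for e, n in pairs] for cat, pairs in toy.items()}
medians = median_scores(scores)
```

Sweeping `threshold` over [0, 1] for each category would give one curve per category, which is the kind of plot the text describes.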


6. FUTURE WORK 

A number of directions exist for future work. One obvious
direction would be to improve the performance of the existing sketch
classifier in terms of number of categories as well as accuracy. A
well-performing classifier which utilizes fewer training samples
translates to a potentially larger set of test category-epitomes for
analysis. We also intend to compare our current distribution of
category-epitomes/epitome-scores and related analysis with that
obtained when human subjects are asked to identify the cumulative
stroke sequences. For this comparison, we plan to use the benchmark
sketch database created by Rosalia et al. [?], who employ
a human-evaluation based technique to identify a subset of 160
non-ambiguous² object categories from the 250 originally provided
by Eitz et al. [13]. However, as the number of categories increases,
visualizing trends (Figures 4 and 5) becomes a challenge. Therefore,
novel visualization methods need to be explored or, alternatively, the
number of categories needs to be curated in a meaningful manner
for representative sketch analysis.

²Rosalia et al. [?] define a non-ambiguous sketch as one whose identity
is agreed upon by at least 2 people among the subjects surveyed.